In [1]:
from IPython.display import HTML
HTML('''
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/2.0.3/jquery.min.js"></script>
<script>
code_show=true; 
function code_toggle() {
 if (code_show){
 $('div.jp-CodeCell > div.jp-Cell-inputWrapper').hide();
 } else {
$('div.jp-CodeCell > div.jp-Cell-inputWrapper').show();
 }
 code_show = !code_show
} 
$( document ).ready(code_toggle);
</script>
<form action="javascript:code_toggle()"><input type="submit" 
value="Click here to toggle on/off the raw code."></form>
''')
Out[1]:
In [2]:
import pandas as pd
import boto3
import zipfile
import geopandas as gpd
import folium
from folium import IFrame
from folium import MacroElement
from folium import plugins
from sklearn.preprocessing import MinMaxScaler
from branca.colormap import LinearColormap
import json
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import plotly.graph_objects as go

s3_client = boto3.client('s3')

import warnings
warnings.filterwarnings("ignore")
In [3]:
# Load some data, including the pre-processed files from EMR
countries_df = gpd.read_file('ne_50m_admin_0_countries.zip')
airports_df = gpd.read_file('ne_10m_airports.zip')

population_per_country = pd.read_json('population_per_country.json', 
                                      lines=True).T
population_per_country.columns = ['population']

subregion_df = pd.read_json('population_per_subregion.json', lines=True).T
subregion_df.columns = ['population']

countries_df = countries_df.merge(population_per_country, how='left', 
                                  left_on='NAME', right_index=True)
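# Population density in people per km^2: reproject to an equal-area CRS
# (World Mollweide, ESRI:54009) because areas in EPSG:4326 are in square degrees
countries_df['population_density'] = (
    countries_df['population'] / (countries_df.to_crs('ESRI:54009').area / 1e6))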
countries_df = gpd.GeoDataFrame(countries_df, geometry='geometry')

with open('population_df_0.json') as f:
    data = json.load(f)

study_area = json.loads(data)
population_df = gpd.GeoDataFrame.from_features(
    study_area["features"]).set_crs('epsg:4326')
In [4]:
# Additional pre-processing: reuse the GeoJSON parsed in the previous cell
# instead of reading population_df_0.json a second time
data = study_area

# Extract population values and point coordinates from the GeoJSON features
populations = [feature["properties"]["total_population"]
               for feature in data["features"]]
coordinates = [feature["geometry"]["coordinates"]
               for feature in data["features"]]

Picture2-3.png

Executive Summary

This research focuses on the geospatial analysis of global air mobility using the High Resolution Population Density Maps + Demographic Estimates dataset, a refined version of data provided by the Center for International Earth Science Information Network (CIESIN), and the Natural Earth public domain map dataset.

The analysis indicates that developed regions, with higher levels of economic prosperity and infrastructure development, have well-established transportation networks, including airports. These airports serve as gateways for international travel, facilitating trade, tourism, and business connections. Consequently, countries in these regions tend to have a higher concentration of airports and better connectivity.

On the other hand, less developed regions face challenges in airport accessibility. Limited infrastructure investment, geographic constraints, and socioeconomic limitations may contribute to fewer airports or less efficient transportation networks. As a result, countries in these regions experience reduced access to airports, which affects their connectivity and potential economic opportunities.

The findings highlight the importance of airport infrastructure and its connection to a country's level of development. They emphasize the need for investment in transportation networks, particularly in less developed regions, to improve accessibility and foster economic growth. Understanding the geospatial dynamics of global air mobility provides valuable insights for policymakers and stakeholders involved in enhancing transportation infrastructure and promoting regional connectivity.

The researchers recommend increasing the granularity of the population data and accounting for air travel restrictions and flight data.

Background

Air transportation is an essential component of modern society, facilitating global connectivity, economic growth, and cultural exchange. However, access to air transportation is not uniformly distributed across the world. Certain regions, subregions, and countries face significant challenges in terms of connectivity and air mobility.

Unequal distribution of air transportation access has significant implications for tourism. Tourism relies heavily on efficient and accessible air travel, as it allows tourists to explore new destinations, experience different cultures, and contribute to local economies. Regions with limited air connectivity may struggle to attract tourists, resulting in missed economic opportunities and hindering the growth of the tourism industry. In contrast, areas with robust air transportation networks can benefit from increased tourist arrivals, generating revenue and employment opportunities.

Unequal distribution of air transportation access can have a profound impact on air traffic and the frequency of delays and cancellations. There were more than 300,000 arrival delays and over 60,000 cancelled flights in the US last year [1], and this could be partly due to uneven air travel opportunities in different regions in the country. Regions with limited air connectivity may experience lower air traffic volumes compared to areas with well-established airport networks. This disparity in air traffic can result in reduced frequency of flights to and from those regions, making it more challenging for travelers to access and depart from these areas. As a consequence, individuals and businesses located in regions with limited air transportation access may experience longer travel times, increased transit costs, and limited options for connecting flights.

Motivation

Unequal distribution of air transportation infrastructure and its impact on various aspects of society have become increasingly evident, especially during times of crisis such as the COVID-19 pandemic. Understanding the implications of unequal access to air transportation is crucial for identifying areas that require infrastructure improvements, policy interventions, and targeted investments. By conducting a comprehensive analysis of the spatial patterns and disparities in air transportation accessibility, this study aims to shed light on the regions, subregions, and countries that face challenges in accessing air transportation.

The COVID-19 pandemic has further emphasized the critical role of air transportation in global supply chains, emergency response, and economic development. The pandemic has disrupted air mobility and highlighted the vulnerability of regions with limited airport access. Studying the unequal distribution of airports and its impact on people during the pandemic provides valuable insights into the challenges faced by underserved regions in managing emergency response efforts, accessing critical supplies, and facilitating essential travel.

Moreover, understanding the implications of unequal distribution on air traffic and the frequency of delays is essential for identifying areas where infrastructure improvements are necessary. Analyzing the relationship between air transportation access and air traffic patterns can help policymakers identify regions that require investment in airport infrastructure, runway expansions, or the development of new airports. Enhancing the capacity and efficiency of airports in underserved regions can alleviate congestion, reduce delays, and improve the overall air travel experience for individuals and businesses operating in these areas.

By addressing the gaps in our understanding of the unequal distribution of airports and its impact on air transportation accessibility, this study aims to contribute to the development of evidence-based policies and interventions. The findings will help policymakers prioritize infrastructure improvements, connectivity enhancements, and regional transportation networks to promote more equitable access to air transportation. Ultimately, the study seeks to foster more inclusive and sustainable development by ensuring that all parts of the world population have access to the benefits of air transportation.

Problem Statement:

"What parts of the world population (regions, subregions, countries, etc.) have access to air transportation?"

Data Source:

High Resolution Population Density Maps + Demographic Estimates

The High Resolution Population Density Maps + Demographic Estimates dataset is a refined version of a dataset from the Center for International Earth Science Information Network (CIESIN). Managed by Meta, this dataset is a valuable resource that provides detailed information on population distribution and demographic estimates at a high-resolution spatial scale.

The dataset is distributed as CSV and Cloud-Optimized GeoTIFF files. It combines satellite imagery, land cover data, and demographic information to generate population density maps, which are essential for understanding the spatial distribution of human populations across regions, subregions, and countries. These maps provide insights into population concentrations, urbanization patterns, and areas with high population densities. In this report, only the population information was used. The schema is shown in Figure 1A.

schema.png

Figure 1A. Schema of population dataset.

file_size.png

Figure 1B. File size of population dataset.

Natural Earth

Natural Earth is a public domain map dataset that provides geospatial data representing various physical and cultural features of the Earth's surface. It offers a range of vector and raster data layers at different scales, including country boundaries, coastlines, rivers, lakes, land cover, and administrative divisions. These data layers are carefully curated and designed to be visually appealing, accurate, and suitable for general-purpose mapping and analysis. Unlike the population dataset, Natural Earth is not structured as a database with a predefined schema; instead, it is distributed as themed files at 1:10m, 1:50m, and 1:110m scales, such as the zipped shapefiles used in this report.

The dataset is widely recognized for its high-quality and consistent cartographic representation, making it a valuable resource for researchers, analysts, and mapmakers. The data layers provided by Natural Earth are often used as a foundational reference for creating maps and performing geospatial analysis tasks in various domains, including geography, environmental science, urban planning, and social sciences.

Methodology:

pipeline.png

Figure 2A. High-level methodology.

The exploratory report follows the high-level methodology outlined in Figure 2A. The following items explain each step in more detail, highlighting the key processes involved.

  1. Data retrieval. The initial step involved retrieving the required datasets, namely the High Resolution Population Density Maps + Demographic Estimates dataset and the Natural Earth dataset, from the AWS Registry of Open Data. This step ran on an Amazon EMR (Elastic MapReduce) cluster consisting of 1 primary node and 3 core nodes, each with 4 cores and 16 GiB of memory. This distributed setup allowed the large volume of data to be processed in parallel.

instance.png

Figure 2B. Amazon EMR Instance.
  2. Data preprocessing. After obtaining the datasets, a preprocessing step filtered and transformed the data, keeping only the fields and values that directly contribute to addressing the problem. Removing unnecessary information streamlined the dataset and kept the subsequent analysis focused on the essential components. For reference, the EMR notebook is attached as lab2-emr.ipynb.

  3. Data persistence. The selected fields and values were then converted to JSON files. The boto3 library was used to export these files selectively to an Amazon S3 bucket, which allowed the rest of the workflow to move to a local machine for further analysis and processing. Exporting compact JSON files removed the need to load the full datasets locally, making the workflow more efficient.

  4. Exploratory analysis. The dataset underwent exploratory analysis to gain insights into its fields and characteristics. Data profiling was used to assess the structure and quality of the dataset, examining missing values, data types, and inconsistencies to identify areas that required cleaning and preprocessing. Visualizations such as bar graphs, choropleth maps, and heat maps were used to depict the distribution of the data across categories and variables.

  5. Data interpretation. The exploratory analysis surfaced trends, relationships, and potential areas of interest within the datasets. Interpreting these findings shaped the conclusions and recommendations of the study, which address the problem statement directly and point to the challenges and opportunities identified in the research.
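The data persistence step can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in lab2-emr.ipynb: it assumes the exported file holds a GeoJSON FeatureCollection whose features carry a `total_population` property (the field read in cell [4]), encoded once more as a JSON string (cell [3] applies `json.load` and then `json.loads` to the same file). The helper names `to_geojson_string` and `upload_geojson` and the bucket name are hypothetical; only `put_object` is a real boto3 S3 client method.

```python
import json


def to_geojson_string(records):
    """Build a GeoJSON FeatureCollection string from (lon, lat, population) tuples."""
    features = [
        {
            "type": "Feature",
            # GeoJSON orders point coordinates as [longitude, latitude]
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": {"total_population": population},
        }
        for lon, lat, population in records
    ]
    return json.dumps({"type": "FeatureCollection", "features": features})


def upload_geojson(s3_client, bucket, key, geojson_string):
    """Encode the GeoJSON string as a JSON string literal and upload it to S3.

    The extra json.dumps mirrors the json.load + json.loads pair used when the
    file is read back. s3_client is any object with a boto3-style put_object.
    """
    body = json.dumps(geojson_string).encode("utf-8")
    s3_client.put_object(Bucket=bucket, Key=key, Body=body)
    return body


# Hypothetical usage with a real client (bucket name is a placeholder):
# import boto3
# geojson = to_geojson_string([(121.0, 14.6, 200.0)])
# upload_geojson(boto3.client("s3"), "example-bucket", "population_df_0.json", geojson)
```

Passing the client in as a parameter keeps the helper testable without AWS credentials; in the notebook it would be the `s3_client` created in cell [2].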

Results and Discussion

This section contains plots generated from the processed JSON files exported from the EMR cluster, together with small shapefiles available from Natural Earth.

Choropleth Maps:

Figure 3 illustrates a choropleth map of the world, where colors represent the number of airports in each country. Dark regions indicate countries with few or no airports, found primarily among developing nations in Africa, in mountainous areas that pose challenges for air travel, and in sparsely populated or uninhabited regions such as Greenland and Antarctica. Figure 4 shows the top 10 countries by number of airports. The United States leads by a significant margin, followed by India, Canada, Mexico, and China. These countries combine large populations, large land areas, or both with substantial economic activity, which contributes to their extensive airport infrastructure. The relationship between population, economy, and airport distribution is explored further in subsequent plots. Figures 5 and 6 aggregate the same counts by subregion.

In [5]:
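# Count number of airports per country with a point-in-polygon spatial join
# (avoids adding one boolean column per country to airports_df and the
# hard-coded 0:241 column slice)
airports_in_countries = gpd.sjoin(airports_df, countries_df[['geometry']],
                                  how='inner', predicate='within')
countries_df['airports'] = (airports_in_countries.groupby('index_right').size()
                            .reindex(countries_df.index, fill_value=0))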

# Plot choropleth map
fig, ax = plt.subplots(figsize=(12,5))
countries_df.plot(column='airports', cmap='plasma', legend=True, ax=ax);
Figure 3. Choropleth map of the number of airports per country.
In [6]:
# Read airports_per_country JSON file
country_airport_df = pd.read_json('airports_per_country.json', orient='index')

# Create bar plot of top 10 countries based on number of airports
fig, ax = plt.subplots(figsize=(10,5))
(countries_df[['NAME', 'airports']].set_index('NAME')
 .sort_values(by='airports', ascending=False)
 .head(10)
 .sort_values(by='airports')
 .plot
 .barh(legend=False, ax=ax))
plt.title('Top 10 countries by number of airports')
plt.xlabel('Number of airports')
plt.ylabel('');
Figure 4. Top 10 countries by number of airports.
In [7]:
subregion_df = gpd.GeoDataFrame(countries_df.dissolve(
    'SUBREGION')['geometry'])

# Read airports_per_subregion JSON file
airports_per_subregion = pd.read_json('airports_per_subregion.json', 
                                      orient='index')
subregion_df['airports'] = airports_per_subregion[0].tolist()

# Plot choropleth map
fig, ax = plt.subplots(figsize=(12,5))
subregion_df.plot(column='airports', cmap='plasma', legend=True, ax=ax);
Figure 5. Choropleth map of the number of airports per subregion.
In [8]:
# Create bar plot of subregions based on number of airports
fig, ax = plt.subplots(figsize=(10,5))
(airports_per_subregion
 .sort_values(by=0)
 .plot
 .barh(legend=False, ax=ax))
plt.title('Number of airports per subregion')
plt.xlabel('Number of airports');
Figure 6. Number of airports per subregion.

Population Heat Map:

Note: the airport markers can be hidden by unchecking the Airports layer in the layer control at the top right of the map.

In [9]:
# Create a DataFrame for the heat map
df = pd.DataFrame({
    "Population": population_df['total_population'],
    "Latitude": [geom.y for geom in population_df['geometry']],
    "Longitude": [geom.x for geom in population_df['geometry']]
})

# Initialize the map at the mean latitude and longitude
center_lat = df["Latitude"].mean()
center_lon = df["Longitude"].mean()
m = folium.Map(location=[center_lat, center_lon], zoom_start=3)

# Create a heat map layer with population as intensity
# (folium computes the maximum intensity automatically, so max_val is omitted)
heat_data = df[['Latitude', 'Longitude', 'Population']].values.tolist()
plugins.HeatMap(heat_data, min_opacity=0.4, radius=15).add_to(m)

# Create a marker layer for the airplanes
airplane_layer = folium.FeatureGroup(name='Airports')

for index, row in airports_df.iterrows():
    name = row['name']
    geometry = row.geometry
    lat, lon = geometry.y, geometry.x
    folium.Marker([lat, lon],
                  popup=name,
                  icon=folium.Icon(icon="plane", prefix="fa",
                                   icon_color="rgba(255, 255, 255, 0.5)",
                                   icon_size=(.5, .5),
                                   opacity=0.8)).add_to(airplane_layer)

# Add the airplane layer to the map
airplane_layer.add_to(m)

# Create a layer control and add it to the map
folium.LayerControl().add_to(m)

# Display the map
m
Out[9]:
Figure 7. Population heat map with overlaid airport locations.

Figure 7 presents a population heat map overlaid with airport locations. Note that the population dataset lacks data for several large countries, including China, Russia, and Australia, so the heat map understates population in these areas.

The map suggests a strong association between the frequency of airports, shown as plane markers, and population concentration. Regions with higher heat-map intensity, indicating larger populations, show a more pronounced presence of airports. This association can be attributed to the interplay between population and the demand for air transportation: areas with a greater concentration of people typically generate higher travel demand, both domestic and international, and so tend to have more airports and more frequent flights.

Two prominent hotspots on the map are New York and Brazil. Both are known for their diversity, economic vibrancy, popular tourist attractions, and thriving sporting cultures. The concentration of airports in these areas aligns with the substantial travel demand generated by their large, diverse populations and busy economies.
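The heat map shows this association visually but does not quantify it. A minimal sketch for doing so computes a Spearman rank correlation, which is rank-based, so a few very large values (such as the United States' airport count) do not dominate. It assumes the per-country `population` and `airports` columns built in cells [3] and [5]; the helper name `spearman_correlation` is hypothetical, and the toy frame below is illustrative only, not a result of this report.

```python
import pandas as pd


def spearman_correlation(df: pd.DataFrame, x: str, y: str) -> float:
    """Spearman rank correlation between two columns.

    Computed as the Pearson correlation of average ranks, after dropping rows
    where either value is missing (e.g. countries absent from the population data).
    """
    pair = df[[x, y]].dropna()
    ranks = pair.rank()  # ties receive the average of their ranks
    return float(ranks[x].corr(ranks[y]))


# Illustrative toy data only (not results from this report)
toy = pd.DataFrame({
    "population": [5e6, 40e6, 120e6, None, 300e6],
    "airports": [2, 9, 30, 4, 150],
})
rho = spearman_correlation(toy, "population", "airports")
```

In the notebook, the call would be `spearman_correlation(countries_df, "population", "airports")`. Because it works on country totals, it measures the association between population size and airport count rather than density.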

Number of airports within a 50-km, 100-km, 200-km, and 500-km radius of each population point
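Counting airports within a radius depends on how distance is measured. Buffering every point in a single projected CRS such as EPSG:32634 (UTM zone 34N) distorts distances for locations far from that zone, so a minimal brute-force sketch based on great-circle (haversine) distance is shown below. It assumes a spherical Earth with a mean radius of 6,371 km (an approximation); the function names and toy coordinates are hypothetical, not taken from the original analysis.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between points in decimal degrees (broadcasts)."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


def count_within_radius(pop_lat, pop_lon, apt_lat, apt_lon, radius_km):
    """For each population point, count airports within radius_km (O(n_pop * n_apt))."""
    # Column vectors (n_pop, 1) against row vectors (1, n_apt) give an
    # (n_pop, n_apt) distance matrix via broadcasting
    d = haversine_km(np.asarray(pop_lat)[:, None], np.asarray(pop_lon)[:, None],
                     np.asarray(apt_lat)[None, :], np.asarray(apt_lon)[None, :])
    return (d <= radius_km).sum(axis=1)


# Toy example: two population points on the equator, two airports nearby
pop_lat, pop_lon = [0.0, 0.0], [0.0, 10.0]
apt_lat, apt_lon = [0.0, 0.0], [0.5, 1.5]
counts_100 = count_within_radius(pop_lat, pop_lon, apt_lat, apt_lon, 100)
```

With the notebook's data, the inputs would be `population_df.geometry.y`, `population_df.geometry.x`, `airports_df.geometry.y`, and `airports_df.geometry.x`. The dense distance matrix grows with the product of both point counts, so a spatial index such as scikit-learn's `BallTree` with the haversine metric scales better for large inputs.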

In [10]:
from sklearn.neighbors import BallTree

# Count airports within 50, 100, 200, and 500 km of each population point.
# Buffering in a single UTM zone (EPSG:32634, zone 34N) distorts distances far
# from that zone, so great-circle (haversine) distances are used instead.
EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical approximation)

# BallTree's haversine metric expects [latitude, longitude] in radians
airport_coords = np.radians(np.column_stack(
    [airports_df.geometry.y, airports_df.geometry.x]))
population_coords = np.radians(np.column_stack(
    [population_df.geometry.y, population_df.geometry.x]))
airport_tree = BallTree(airport_coords, metric='haversine')

for radius_km in (50, 100, 200, 500):
    # Convert the radius from km to radians on the unit sphere
    population_df[f'num_airports_in_{radius_km}km'] = airport_tree.query_radius(
        population_coords, r=radius_km / EARTH_RADIUS_KM, count_only=True)
In [11]:
joined_df = gpd.sjoin(population_df, countries_df, how="left", predicate="within")
In [12]:
airgroup_50 = joined_df.groupby(
    ['NAME'])['num_airports_in_50km'].mean().sort_values(ascending=False)
airgroup_100 = joined_df.groupby(
    ['NAME'])['num_airports_in_100km'].mean().sort_values(ascending=False)
airgroup_200 = joined_df.groupby(
    ['NAME'])['num_airports_in_200km'].mean().sort_values(ascending=False)
airgroup_500 = joined_df.groupby(
    ['NAME'])['num_airports_in_500km'].mean().sort_values(ascending=False)
In [13]:
# Create a bar plot using Plotly
fig = go.Figure(data=[go.Bar(x=airgroup_100.index, y=airgroup_100.values)])

# Set plot title and axis labels
fig.update_layout(
    title='Number of airports within 100km',
    xaxis_title='Country',
    yaxis_title='Average number of Airports',
    height=600
)

# Show the plot
fig.show()
Figure 8. Average number of airports within a 100-km radius in each country.
In [14]:
# Create a bar plot using Plotly
fig = go.Figure(data=[go.Bar(x=airgroup_50.index, y=airgroup_50.values)])

# Set plot title and axis labels
fig.update_layout(
    title='Number of airports within 50km',
    xaxis_title='Country',
    yaxis_title='Average number of Airports',
    height=600
)

# Show the plot
fig.show()
Figure 9. Average number of airports within a 50-km radius in each country.

Figure 8 and Figure 9 illustrate the average number of airports within a 100-km and 50-km radius, respectively, for each country. Analyzing these graphs reveals a prominent pattern where countries with easy accessibility to airports are predominantly located in developed regions. This observation suggests a correlation between airport accessibility and a country's level of development.

Developed regions, characterized by higher levels of economic prosperity and infrastructure development, tend to possess well-established transportation networks, including a robust network of airports. These airports serve as crucial gateways for international travel, promoting trade, tourism, and business connections. Consequently, countries in these regions often exhibit a higher concentration of airports and enjoy superior connectivity.

In contrast, less developed regions may encounter challenges in terms of airport accessibility. Factors such as limited infrastructure investment, geographic constraints, and socioeconomic limitations can contribute to fewer airports or less efficient transportation networks. As a result, countries in these regions may experience reduced accessibility to airports, which can impact their connectivity and limit potential economic opportunities.

Additionally, small landlocked countries score well because airports in neighboring countries fall within the search radius. Lacking coastlines and, often, large airports of their own, these countries rely on nearby foreign airports for international travel, trade, and connectivity.

It is important to note that while the observed trend suggests a link between airport accessibility and development, it does not establish a causal relationship. Development is a complex, multifaceted process influenced by many factors beyond airport accessibility. Nonetheless, the graphs show how airports are distributed and highlight the spatial patterns associated with developed regions.

In [15]:
economy_50 = joined_df.groupby(
    ['ECONOMY'])['num_airports_in_50km'].mean().sort_values(ascending=False)
# Create a bar plot using Plotly
fig = go.Figure(data=[go.Bar(x=economy_50.index, y=economy_50.values)])

# Set plot title and axis labels
fig.update_layout(
    title='Number of airports based on Economic Region',
    xaxis_title='Economic Region',
    yaxis_title='Average number of Airports',
    height=600
)

# Show the plot
fig.show()
Figure 10. Average number of airports within a 50-km radius in each economic region.

Figure 10 shows the average number of airports within a 50-km radius in each economic region. The comparison between G7 countries and their developed non-G7 counterparts highlights an interesting contrast: G7 countries may score lower on airport accessibility. One plausible explanation is their size. G7 members such as the United States and Canada have large land areas that combine vast rural areas, dense urban centers, and complex terrain, which makes uniformly high airport accessibility harder to achieve across the whole country.
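The land-area explanation can be checked by normalizing airport counts by area before comparing groups, as a complement to the radius-based averages in Figure 10. A minimal sketch pools airports and area within each `ECONOMY` group. It assumes a per-country `area_km2` column that this report does not compute (it could be derived by reprojecting `countries_df` to an equal-area CRS and dividing `.area` by 1e6); the helper name and the toy data are hypothetical and illustrative only.

```python
import pandas as pd


def airports_per_10k_km2(df: pd.DataFrame, group_col: str = "ECONOMY") -> pd.Series:
    """Pooled airport density per group: total airports / total area, per 10,000 km^2."""
    totals = df.groupby(group_col)[["airports", "area_km2"]].sum()
    density = totals["airports"] / totals["area_km2"] * 10_000
    return density.sort_values(ascending=False)


# Illustrative toy data only: group labels and numbers are made up
toy = pd.DataFrame({
    "ECONOMY": ["large", "large", "small"],
    "airports": [6, 4, 5],
    "area_km2": [60_000.0, 40_000.0, 10_000.0],
})
density = airports_per_10k_km2(toy)
```

Pooling (total airports divided by total area) weights each country by its size; averaging per-country densities instead would treat small and large countries equally, which answers a different question.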

Conclusion

In conclusion, the relationships observed between airport frequency, population density, and economic development highlight the importance of airport infrastructure for regional connectivity and economic growth. The presence of airports in densely populated areas indicates the demand for air travel and the need for efficient transportation networks to serve those populations. Developed regions with higher levels of economic prosperity tend to have better-established airport networks, providing greater connectivity and opportunities for trade, tourism, and business.

Conversely, less developed regions face challenges in terms of airport accessibility, which can hinder their connectivity and potential economic opportunities. Limited infrastructure investment, geographic constraints, and socioeconomic limitations contribute to the disparity in airport infrastructure and connectivity between developed and less developed regions.

These findings underscore the significance of strategic investments in airport infrastructure to enhance connectivity, promote regional development, and facilitate economic growth. Policymakers and stakeholders should consider the relationship between population density, economic development, and airport accessibility when planning and implementing transportation strategies. By addressing the infrastructure needs of less developed regions and improving airport connectivity, countries can bridge the gap and foster inclusive growth in air mobility, benefiting both local populations and the broader economy.

Recommendations

The following recommendations address the challenges that the team faced in the course of the study:

  1. Increase the granularity of the population data: the population counts used here were aggregated sums over rounded latitude and longitude coordinates, which limits spatial detail.
  2. Focus on specific subregions: this research compares world averages at the country level. Comparing baselines within a single subregion or country may reveal insights that country-level averages hide.
  3. Join with datasets on infrastructure, water sources, and topography: the distances used in this report are straight-line distances between two points. Along these lines there may be extreme topography such as mountains, oceans, or congested cities with complex road networks, all of which affect the true travel time to the nearest airport.
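Recommendation 1 refers to population counts aggregated over rounded coordinates. A minimal pandas sketch of that kind of aggregation shows how the rounding precision controls granularity while preserving the population total. It assumes hypothetical column names `latitude`, `longitude`, and `population`; the actual EMR job in lab2-emr.ipynb may use different names and tooling, and the toy cells below are made up.

```python
import pandas as pd


def aggregate_by_rounded_coords(df: pd.DataFrame, decimals: int = 0) -> pd.DataFrame:
    """Sum population over grid cells defined by rounding lat/lon to `decimals` places.

    Column names latitude, longitude, and population are assumptions for illustration.
    """
    rounded = df.assign(
        latitude=df["latitude"].round(decimals),
        longitude=df["longitude"].round(decimals),
    )
    return (rounded.groupby(["latitude", "longitude"], as_index=False)["population"]
            .sum())


# Hypothetical fine-resolution input
cells = pd.DataFrame({
    "latitude": [14.61, 14.66, 15.52],
    "longitude": [121.01, 121.06, 120.97],
    "population": [120.0, 80.0, 50.0],
})

coarse = aggregate_by_rounded_coords(cells, decimals=0)  # roughly 1-degree cells
fine = aggregate_by_rounded_coords(cells, decimals=1)    # roughly 0.1-degree cells
```

Here the two nearby cells merge into one 1-degree cell but stay separate at 0.1 degrees. Finer rounding keeps more spatial detail at the cost of more rows to process and export.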

References

[1] U.S. Department of Transportation, Bureau of Transportation Statistics. (n.d.). On-Time Performance - Reporting Operating Carrier Flight Delays at a Glance. TranStats. Retrieved from https://www.transtats.bts.gov/homedrillchart.asp

[2] Independent. (2023, April 5). France strike: Flights and Eurostar trains cancelled ahead of next major walkout. The Independent. Retrieved June 11, 2023, from https://www.independent.co.uk/travel/news-and-advice/france-strike-flights-eurostar-cancelled-b2314453.html
